Supporting On-the-fly Provenance Tracking in Stream Processing Systems

Authors

  • Mark J. Weal
  • Tokuro Matsuo
  • Yongik Yoon
  • Hamad Alsawalqah
  • Sungwon Kang
  • Bashar Al-Shboul
  • Jihyun Lee
  • Mitsuo Wakatsuki
  • Etsuji Tomita
  • Tetsuro Nishino
  • Kunihito Hoki
  • Tomoyuki Kaneko
  • Daisaku Yokoyama
  • Takuya Obata
  • Hiroshi Yamashita
  • Takeshi Ito
  • Roger Y. Lee
  • Chia-Chu Chiang
  • Chisu Wu
  • Jixin Ma
  • Haeng-Kon Kim
  • Dale Karolak
  • Yucong Duan
  • Shaochun Xu
  • John McGregor
  • Pascale Minet
  • Susanna Pelagatti
  • Antoine Bossard
  • Watsawee Sansrimahachai
Abstract

A new class of data management systems that operate on high-volume streaming data is becoming increasingly important. Because such systems must process unpredictable streaming data in real time and deliver instantaneous responses, it is very difficult to validate stream processing results in a timely manner, verify the stream computation that took place, and investigate the processing steps used to generate result data. A mechanism that can precisely track the provenance of data streams at execution time is therefore crucial for confidence in the results these systems produce. This paper presents a novel on-the-fly stream provenance tracking mechanism that enables a collection of provenance queries to be performed dynamically without requiring provenance information to be stored persistently. The experimental results indicate that the impact of provenance collection on system performance is relatively small (7% overhead observed). In addition, our provenance solution offers low-latency processing (about 0.3 ms per additional component) with reasonable memory consumption.

Similar resources

Tracking Stream Provenance in Complex Event Processing Systems for Workflow-Driven Computing

Workflow-driven, dynamically adaptive e-Science is a form of scientific investigation often using a Service-Oriented Architecture (SOA) paradigm, designed to use large-scale computational resources on-the-fly to execute workflows consisting of parallel models, analysis, and visualization tasks. In the Linked Environments for Atmospheric Discovery (LEAD) project, with which our team is involved,...

Advances and Challenges for Scalable Provenance in Stream Processing Systems

While data provenance is a relatively well-studied topic in both the fields of databases and workflow systems, its support within stream processing systems presents a new set of challenges. Given the potentially high event rate of the input streams and the low processing latency requirements imposed by many streaming applications, capturing data provenance effectively in a stream processing sys...

Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering

Data streams flowing from the physical environment are as unpredictable as the environment itself. Radars go down, long-haul networks drop packets, and readings are corrupted on the wire. Yet data-driven scientific models and data mining algorithms do not necessarily account for these inaccuracies when assimilating the data. Low overhead provenance collection partially solves this problem. We...

The Case for Fine-Grained Stream Provenance

The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (such as which sensors in a sensor network an aggregated value is derived from). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we firs...

Assessing the Trustworthiness of Streaming Data

The notion of confidence policy is a novel notion that exploits trustworthiness of data items in data management and query processing. In this paper we address the problem of enforcing confidence policies in data stream management systems (DSMSs), which is crucial in supporting users with different access rights, processing confidence-aware continuous queries, and protecting the secure streami...

Publication date: 2014